(Artificial) Neural Networks (ANN)


By Iljeok Kim and Sooyoung Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Recall Supervised Learning Setup¶

Perceptron


XOR Problem

  • Minsky-Papert Controversy on XOR
    • not linearly separable
    • limitation of perceptron
$x_1$   $x_2$   $x_1$ XOR $x_2$
  0       0           0
  0       1           1
  1       0           1
  1       1           0



2. From Perceptron to Multi-Layer Perceptron (MLP)¶

2.1. Perceptron for $h_{\omega}(x)$¶

  • Neurons compute the weighted sum of their inputs

  • A neuron is activated or fired when the sum $a$ is positive


$$ \begin{align*} a &= \omega_0 + \omega_1 x_1 + \omega_2 x_2 \\ \\ \hat{y} &= g(a) = \begin{cases} 1 & a > 0\\ 0 & \text{otherwise} \end{cases} \end{align*} $$
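The two equations above can be sketched directly in NumPy. The weights below are hand-picked for illustration (with them the neuron happens to implement logical OR), not learned:

```python
import numpy as np

def perceptron(x1, x2, w0=-0.5, w1=1.0, w2=1.0):
    # weighted sum of the inputs plus the bias term w0
    a = w0 + w1 * x1 + w2 * x2
    # step activation: fire (1) only when the sum is positive
    return 1 if a > 0 else 0
```

With these weights, `perceptron(0, 0)` returns 0 while `perceptron(0, 1)` and `perceptron(1, 1)` return 1.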



  • A step function is not differentiable


  • One layer is often not enough
    • One hyperplane

2.2. Multi-layer Perceptron = Artificial Neural Networks (ANN)¶

Multi-neurons



Differentiable activation function
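Common differentiable choices include the sigmoid, tanh, and ReLU; a minimal NumPy sketch:

```python
import numpy as np

def sigmoid(z):
    # smooth, differentiable stand-in for the step function
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    # piecewise linear; differentiable everywhere except z = 0
    return np.maximum(0, z)
```

A useful property for gradient descent: sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).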




In a compact representation




Multi-layer perceptron


2.3. Another Perspective: ANN as Kernel Learning¶

We can represent this "neuron" as follows:

  • The main weakness of linear predictors is their lack of capacity. For classification, the populations have to be linearly separable.

  • The XOR example can be solved by pre-processing the data to make the two populations linearly separable.
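One such pre-processing step is appending the product feature $x_1 x_2$: in the augmented space $(x_1, x_2, x_1 x_2)$ the two XOR populations become linearly separable. A sketch with hand-picked (not learned) weights:

```python
import numpy as np

def xor_linear(x1, x2):
    # augment the input with the product feature x1*x2
    phi = np.array([x1, x2, x1 * x2])
    w = np.array([1.0, 1.0, -2.0])   # hand-picked weights for illustration
    # a single linear threshold now separates the two XOR classes
    return 1 if w @ phi > 0.5 else 0
```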


  • Universal function approximator
  • Universal function classifier

Parameterized


Example: Linear Classifier

  • Perceptron tries to separate the two classes of data by dividing them with a line


Example: Neural Networks

  • The hidden layer learns a representation so that the data is linearly separable


colah's blog

3. Logistic Regression¶

3.1. Logistic Regression with TensorFlow¶

$$y^{(i)} \in \{1,0\}$$
InĀ [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import time
%matplotlib inline
InĀ [2]:
# training data generation
m = 1000
x1 = 8*np.random.rand(m, 1)
x2 = 7*np.random.rand(m, 1) - 4

g = 0.8*x1 + x2 - 3

C1 = np.where(g >= 0)[0]
C0 = np.where(g < 0)[0]
N = C1.shape[0]
M = C0.shape[0]
m = N + M

X1 = np.hstack([np.ones([N,1]), x1[C1], x2[C1]])
X0 = np.hstack([np.ones([M,1]), x1[C0], x2[C0]])

train_X = np.vstack([X1, X0])
train_y = np.vstack([np.ones([N,1]), -np.ones([M,1])])

train_X = np.asmatrix(train_X)
train_y = np.asmatrix(train_y)

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.legend(loc = 1, fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.show()
InĀ [3]:
train_y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
train_y = np.asmatrix(train_y)
InĀ [4]:
import tensorflow as tf

LR = 0.05
n_iter = 10000

x = tf.placeholder(tf.float32, [None, 3])
y = tf.placeholder(tf.float32, [None, 1])

w = tf.Variable(tf.random_normal([3,1]))

# compute y_pred
y_pred = tf.matmul(x, w)


loss = tf.nn.sigmoid_cross_entropy_with_logits(logits = y_pred, labels = y)
loss = tf.reduce_mean(loss)

optm = tf.train.GradientDescentOptimizer(LR).minimize(loss)
init = tf.global_variables_initializer()
InĀ [5]:
# feed the training data into x and y
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_iter):
        sess.run(optm, feed_dict = {x: train_X, y: train_y})
    
    w_hat = sess.run(w)
InĀ [6]:
x1p = np.arange(0, 8, 0.01).reshape(-1, 1)
x2p = - w_hat[1,0]/w_hat[2,0]*x1p - w_hat[0,0]/w_hat[2,0]

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, 'g', linewidth = 3, label = '')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.show()

3.2. Logistic Regression in a Form of Neural Network¶



$$y = \sigma \,(\omega_0 + \omega_1 x_1 + \omega_2 x_2)$$




InĀ [7]:
# define input and output size

n_input = 3
n_output = 1
InĀ [8]:
# define weights as a dictionary 

weights = {
    'output' : tf.Variable(tf.random_normal([n_input, n_output], stddev = 0.1))
}
InĀ [9]:
# define placeholders for train_x and train_y

x = tf.placeholder(tf.float32, [None, 3])
y = tf.placeholder(tf.float32, [None, 1])
InĀ [10]:
# define network architecture

def build_model(x, weights):   
    output = tf.matmul(x, weights['output'])    
    return output
InĀ [11]:
# define loss

pred = build_model(x, weights)
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits = pred, labels = y)
loss = tf.reduce_mean(loss)
InĀ [12]:
LR = 0.05
optm = tf.train.GradientDescentOptimizer(LR).minimize(loss)
InĀ [13]:
n_batch = 50     # Batch size
n_iter = 10000   # Learning iteration
n_prt = 250      # Print cycle

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# training or learning

loss_record = []
for epoch in range(n_iter):
    sess.run(optm, feed_dict = {x: train_X,  y: train_y})    
    if epoch % n_prt == 0:
        loss_record.append(sess.run(loss, feed_dict = {x: train_X,  y: train_y}))
        
w_hat = sess.run(weights['output'])
InĀ [14]:
plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record))*n_prt, loss_record)
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.show()
InĀ [15]:
x1p = np.arange(0, 8, 0.01).reshape(-1, 1)
x2p = - w_hat[1,0]/w_hat[2,0]*x1p - w_hat[0,0]/w_hat[2,0]

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, 'g', linewidth = 3, label = '')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.show()

Weights and Bias

  • In a neural network, weights and biases are typically separated.



$$ \begin{align*} y_j &= \left(\sum\limits_i \omega_{ij}x_i\right) + b_j\\ y &= \omega^T \mathcal{x} + \mathcal{b} \end{align*} $$



$$y = \sigma \,(b + \omega_1 x_1 + \omega_2 x_2)$$





InĀ [16]:
n_input = 2
n_output = 1
InĀ [17]:
train_X = train_X[:,1:3]
InĀ [18]:
# define network

def build_model(x, weights, biases):   
    output = tf.add(tf.matmul(x, weights['output']), biases['output'])
    return output
InĀ [19]:
weights = {
    'output' : tf.Variable(tf.random_normal([n_input, n_output], stddev = 0.1))
}

biases = {
    'output' : tf.Variable(tf.random_normal([n_output], stddev = 0.1))
}

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])

# compare the predictions against the labels
pred = build_model(x, weights, biases)
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits = pred, labels = y)
loss = tf.reduce_mean(loss)

LR = 0.05
optm = tf.train.GradientDescentOptimizer(LR).minimize(loss)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

n_batch = 50     
n_iter = 15000   
n_prt = 250      

loss_record = []
for epoch in range(n_iter):
    sess.run(optm, feed_dict = {x: train_X,  y: train_y})     
    if epoch % n_prt == 0:
        loss_record.append(sess.run(loss, feed_dict = {x: train_X,  y: train_y}))
        
w_hat = sess.run(weights['output'])
b_hat = sess.run(biases['output'])

plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record))*n_prt, loss_record)
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.show()
InĀ [20]:
x1p = np.arange(0, 8, 0.01).reshape(-1, 1)
x2p = - w_hat[0,0]/w_hat[1,0]*x1p - b_hat[0]/w_hat[1,0]

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, 'g', linewidth = 3, label = '')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.show()

One-hot Encoding

  • One-hot encoding is a conventional practice for multi-class classification


$$y^{(i)} \in \{1,0\} \quad \implies \quad y^{(i)} \in \{[0,1],[1,0]\}$$

  • tf.nn.sigmoid_cross_entropy_with_logits $\rightarrow$ tf.nn.softmax_cross_entropy_with_logits
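One-hot encoding amounts to indexing rows of an identity matrix; a minimal NumPy sketch (an alternative to sklearn's OneHotEncoder):

```python
import numpy as np

labels = np.array([0, 1, 1, 0])   # integer class indices
one_hot = np.eye(2)[labels]       # each row is the one-hot vector for its label
print(one_hot)
```

`np.eye(2)[0]` is `[1, 0]` and `np.eye(2)[1]` is `[0, 1]`, so indexing by the label vector encodes every sample at once.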
InĀ [21]:
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(handle_unknown='ignore')
train_y = ohe.fit_transform(train_y).toarray()
print(train_y)
[[0. 1.]
 [0. 1.]
 [0. 1.]
 ...
 [1. 0.]
 [1. 0.]
 [1. 0.]]




InĀ [22]:
# the number of output nodes changes (one per class)
n_input = 2
n_output = 2
InĀ [23]:
weights = {
    'output' : tf.Variable(tf.random_normal([n_input, n_output], stddev = 0.1))
}

biases = {
    'output' : tf.Variable(tf.random_normal([n_output], stddev = 0.1))
}

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])

pred = build_model(x, weights, biases)
loss = tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)
loss = tf.reduce_mean(loss)

LR = 0.05
optm = tf.train.GradientDescentOptimizer(LR).minimize(loss)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

n_batch = 50     
n_iter = 10000   
n_prt = 250      

loss_record = []
for epoch in range(n_iter):
    sess.run(optm, feed_dict = {x: train_X,  y: train_y})     
    if epoch % n_prt == 0:
        loss_record.append(sess.run(loss, feed_dict = {x: train_X,  y: train_y}))
        
w_hat = sess.run(weights['output'])
b_hat = sess.run(biases['output'])

plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record))*n_prt, loss_record)
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.show()

InĀ [24]:
print(w_hat)

x1p = np.arange(0, 8, 0.01).reshape(-1, 1)
x2p = - w_hat[0,0]/w_hat[1,0]*x1p - b_hat[0]/w_hat[1,0]
x3p = - w_hat[0,1]/w_hat[1,1]*x1p - b_hat[1]/w_hat[1,1]

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, 'k', linewidth = 3, label = '')
plt.plot(x1p, x3p, 'g', linewidth = 3, label = '')
plt.xlim([0, 8])
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 12)
plt.show()
[[-1.5832131  1.5183021]
 [-2.0766377  1.9309301]]

4. Looking at Parameters¶

  • To understand the network's behavior

4.1. Multi-Layers¶

InĀ [25]:
# training data generation

m = 1000
x1 = 10*np.random.rand(m, 1) - 5
x2 = 8*np.random.rand(m, 1) - 4

g = - 0.5*(x1-1)**2 + 2*x2 + 5

C1 = np.where(g >= 0)[0]
C0 = np.where(g < 0)[0]
N = C1.shape[0]
M = C0.shape[0]
m = N + M

X1 = np.hstack([x1[C1], x2[C1]])
X0 = np.hstack([x1[C0], x2[C0]])

train_X = np.vstack([X1, X0])
train_X = np.asmatrix(train_X)

train_y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
ohe = OneHotEncoder(handle_unknown='ignore')
train_y = ohe.fit_transform(train_y).toarray()

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.legend(loc = 1, fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.xlim([-5, 5])
plt.ylim([-4, 4])
plt.show()




InĀ [26]:
# nodeģ˜ 개수
n_input = ?
n_hidden = ?
n_output = ?
InĀ [27]:
# number of weights (arrows between layers)
weights = {
    'hidden' : tf.Variable(tf.random_normal([n_input, n_hidden], stddev = 0.1)),
    'output' : tf.Variable(tf.random_normal([n_hidden, n_output], stddev = 0.1))
}

biases = {
    'hidden' : tf.Variable(tf.random_normal([n_hidden], stddev = 0.1)),
    'output' : tf.Variable(tf.random_normal([n_output], stddev = 0.1))
}

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])



InĀ [28]:
def build_model(x, weights, biases):
    hidden = tf.add(tf.matmul(x, weights['hidden']), biases['hidden'])
    hidden = tf.nn.sigmoid(hidden)
    
    output = tf.add(tf.matmul(hidden, weights['output']), biases['output'])  
    return output
InĀ [29]:
# compare the predictions against the labels
pred = build_model(x, weights, biases)
loss = tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y)
loss = tf.reduce_mean(loss)

LR = 0.01
optm = tf.train.GradientDescentOptimizer(LR).minimize(loss)

sess = tf.Session()

init = tf.global_variables_initializer()
sess.run(init)

n_batch = 50     
n_iter = 30000   
n_prt = 250      


# feed the training data
loss_record = []
for epoch in range(n_iter):
    sess.run(optm, feed_dict = {x: train_X,  y: train_y})
    if epoch % n_prt == 0:
        loss_record.append(sess.run(loss, feed_dict = {x: train_X,  y: train_y}))
        
w_hat = sess.run(weights)
b_hat = sess.run(biases)

plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record))*n_prt, loss_record)
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.show()
InĀ [30]:
H = train_X*w_hat['hidden'] + b_hat['hidden']
H = 1/(1 + np.exp(-H))
InĀ [31]:
plt.figure(figsize=(10, 8))
plt.plot(H[0:N,0], H[0:N,1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(H[N:m,0], H[N:m,1], 'bo', alpha = 0.4, label = 'C0')
plt.xlabel('$z_1$', fontsize = 15)
plt.ylabel('$z_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.axis('equal')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.show()
InĀ [32]:
x1p = np.arange(0, 1, 0.01).reshape(-1, 1)
x2p = - w_hat['output'][0,0]/w_hat['output'][1,0]*x1p - b_hat['output'][0]/w_hat['output'][1,0]
x3p = - w_hat['output'][0,1]/w_hat['output'][1,1]*x1p - b_hat['output'][1]/w_hat['output'][1,1]

plt.figure(figsize=(10, 8))
plt.plot(H[0:N,0], H[0:N,1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(H[N:m,0], H[N:m,1], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, 'k', linewidth = 3, label = '')
plt.plot(x1p, x3p, 'g', linewidth = 3, label = '')
plt.xlabel('$z_1$', fontsize = 15)
plt.ylabel('$z_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.axis('equal')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.show()
InĀ [33]:
x1p = np.arange(-5, 5, 0.01).reshape(-1, 1)
x2p = - w_hat['hidden'][0,0]/w_hat['hidden'][1,0]*x1p - b_hat['hidden'][0]/w_hat['hidden'][1,0]
x3p = - w_hat['hidden'][0,1]/w_hat['hidden'][1,1]*x1p - b_hat['hidden'][1]/w_hat['hidden'][1,1]

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, 'k', linewidth = 3, label = '')
plt.plot(x1p, x3p, 'g', linewidth = 3, label = '')
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.axis('equal')
plt.xlim([-5, 5])
plt.ylim([-4, 4])
plt.show()

4.2. Multi-Neurons¶

InĀ [34]:
# training data generation

m = 1000
x1 = 10*np.random.rand(m, 1) - 5
x2 = 8*np.random.rand(m, 1) - 4

g = - 0.5*(x1*x2-1)**2 + 2*x2 + 5

C1 = np.where(g >= 0)[0]
C0 = np.where(g < 0)[0]
N = C1.shape[0]
M = C0.shape[0]
m = N + M

X1 = np.hstack([x1[C1], x2[C1]])
X0 = np.hstack([x1[C0], x2[C0]])

train_X = np.vstack([X1, X0])
train_X = np.asmatrix(train_X)

train_y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
ohe = OneHotEncoder(handle_unknown='ignore')
train_y = ohe.fit_transform(train_y).toarray()

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.legend(loc = 1, fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.xlim([-5, 5])
plt.ylim([-4, 4])
plt.show()




InĀ [35]:
n_input = 2
n_hidden = 4
n_output = 2
InĀ [36]:
def build_model(x, weights, biases):    
    hidden = tf.add(tf.matmul(x, weights['hidden']), biases['hidden'])    
    hidden = tf.nn.sigmoid(hidden)
    
    output = tf.add(tf.matmul(hidden, weights['output']), biases['output'])    
    return output
InĀ [37]:
weights = {
    'hidden' : tf.Variable(tf.random_normal([n_input, n_hidden], stddev = 0.1)),
    'output' : tf.Variable(tf.random_normal([n_hidden, n_output], stddev = 0.1))
}

biases = {
    'hidden' : tf.Variable(tf.random_normal([n_hidden], stddev = 0.1)),
    'output' : tf.Variable(tf.random_normal([n_output], stddev = 0.1))
}

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
InĀ [38]:
pred = build_model(x, weights, biases)
loss = tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y)
loss = tf.reduce_mean(loss)

LR = 0.01
optm = tf.train.GradientDescentOptimizer(LR).minimize(loss)

sess = tf.Session()

init = tf.global_variables_initializer()
sess.run(init)

n_batch = 50     
n_iter = 40000   
n_prt = 250      

# Training cycle
loss_record = []
for epoch in range(n_iter):
    sess.run(optm, feed_dict = {x: train_X,  y: train_y})
    if epoch % n_prt == 0:
        loss_record.append(sess.run(loss, feed_dict = {x: train_X,  y: train_y}))
        
w_hat = sess.run(weights)
b_hat = sess.run(biases)

# plots
plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record))*n_prt, loss_record)
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.show()
InĀ [39]:
x1p = np.arange(-5, 5, 0.01).reshape(-1, 1)
x2p = - w_hat['hidden'][0,0]/w_hat['hidden'][1,0]*x1p - b_hat['hidden'][0]/w_hat['hidden'][1,0]
x3p = - w_hat['hidden'][0,1]/w_hat['hidden'][1,1]*x1p - b_hat['hidden'][1]/w_hat['hidden'][1,1]
x4p = - w_hat['hidden'][0,2]/w_hat['hidden'][1,2]*x1p - b_hat['hidden'][2]/w_hat['hidden'][1,2]
x5p = - w_hat['hidden'][0,3]/w_hat['hidden'][1,3]*x1p - b_hat['hidden'][3]/w_hat['hidden'][1,3]

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.plot(x1p, x2p, 'k', linewidth = 3, label = '')
plt.plot(x1p, x3p, 'g', linewidth = 3, label = '')
plt.plot(x1p, x4p, 'm', linewidth = 3, label = '')
plt.plot(x1p, x5p, 'c', linewidth = 3, label = '')
plt.xlabel('$x_1$', fontsize = 15)
plt.ylabel('$x_2$', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.axis('equal')
plt.xlim([-5, 5])
plt.ylim([-4, 4])
plt.show()

5. Artificial Neural Networks¶

  • Complex/Nonlinear universal function approximator
    • Linearly connected networks
    • Simple nonlinear neurons
  • Hidden layers
    • Autonomous feature learning



5.1. Recursive Algorithm¶

  • One of the central ideas of computer science

  • Depends on solutions to smaller instances of the same problem ( = subproblem)

  • A function that calls itself (something impossible in the physical world)



  • Factorial example


$$n ! = n \cdot (n-1) \cdots 2 \cdot 1$$

InĀ [40]:
n = 5

m = 1
for i in range(n):
    m = m*(i+1)
    
print(m)
120
InĀ [41]:
def fac(n):
    if n == 1:
        return 1
    else:
        return n*fac(n-1)    
InĀ [42]:
# recursive

fac(5)
Out[42]:
120

5.2. Dynamic Programming¶

  • Dynamic Programming: general, powerful algorithm design technique

  • Fibonacci numbers:

InĀ [43]:
# naive Fibonacci

def fib(n):
    if n <= 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)
InĀ [44]:
fib(10)
Out[44]:
55
InĀ [45]:
# Memoized DP Fibonacci

def mfib(n):
    global memo
        
    if memo[n-1] != 0:
        return memo[n-1]
    elif n <= 2:
        memo[n-1] = 1
        return memo[n-1]
    else:
        memo[n-1] = mfib(n-1) + mfib(n-2)
        return memo[n-1]
InĀ [46]:
import numpy as np

n = 10
memo = np.zeros(n)
mfib(n)
Out[46]:
55.0
InĀ [47]:
n = 30
%timeit fib(30)
159 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
InĀ [48]:
memo = np.zeros(n)
%timeit mfib(30)
422 ns ± 0.911 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
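Python's standard library offers the same memoization through functools.lru_cache, which caches return values keyed by arguments:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cfib(n):
    # identical recursion; the decorator stores previously computed values,
    # so each subproblem is solved only once
    return 1 if n <= 2 else cfib(n - 1) + cfib(n - 2)

print(cfib(30))   # 832040
```

This avoids managing a global `memo` array by hand while giving the same linear-time behavior.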

5.3. Training Neural Networks¶

$=$ Learning or estimating weights and biases of multi-layer perceptron from training data

5.3.1. Optimization¶

3 key components

  1. objective function $f(\cdot)$
  2. decision variable or unknown $\omega$
  3. constraints $g(\cdot)$

In mathematical expression



$$\begin{align*} \min_{\omega} \quad &f(\omega) \end{align*} $$

5.3.2. Loss Function¶

  • Measures error between target values and predictions


$$ \min_{\omega} \sum_{i=1}^{m}\ell\left( h_{\omega}\left(x^{(i)}\right),y^{(i)}\right)$$

  • Example
    • Squared loss (for regression): $$ \frac{1}{m} \sum_{i=1}^{m} \left(h_{\omega}\left(x^{(i)}\right) - y^{(i)}\right)^2 $$
    • Cross entropy (for classification): $$ -\frac{1}{m}\sum_{i=1}^{m}y^{(i)}\log\left(h_{\omega}\left(x^{(i)}\right)\right) + \left(1-y^{(i)}\right)\log\left(1-h_{\omega}\left(x^{(i)}\right)\right)$$
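Both example losses can be written directly in NumPy; here `h` stands for the vector of predictions $h_{\omega}(x^{(i)})$ and `y` for the targets (the names are illustrative):

```python
import numpy as np

def squared_loss(h, y):
    # mean squared error, the regression loss above
    return np.mean((h - y) ** 2)

def cross_entropy(h, y):
    # binary cross entropy, the classification loss above;
    # clip predictions to avoid log(0)
    h = np.clip(h, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```

For example, a maximally uncertain prediction h = 0.5 on a positive label gives cross entropy log 2 ≈ 0.693.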

5.3.3. Learning¶

Learning weights and biases from data using gradient descent


$$\omega \Leftarrow \omega - \alpha \nabla_{\omega} \ell \left( h_{\omega}\left(x^{(i)}\right), y^{(i)} \right)$$
  • $\frac{\partial \ell}{\partial \omega}$: too many computations are required for all $\omega$
  • Structural constraints of NN:
    • Composition of functions
    • Chain rule
    • Dynamic programming
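For a single sigmoid neuron with cross-entropy loss, the gradient has the closed form $\left(\sigma(\omega^T x) - y\right)x$, so one update of the rule above can be sketched as:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sgd_step(w, x, y, alpha=0.1):
    # closed-form gradient of cross-entropy loss for one sigmoid neuron:
    # d loss / d w = (sigmoid(w @ x) - y) * x
    grad = (sigmoid(w @ x) - y) * x
    return w - alpha * grad

w = np.zeros(2)
w = sgd_step(w, np.array([1.0, 1.0]), 1.0)   # one gradient-descent update
```

Starting from w = 0, the prediction is sigmoid(0) = 0.5, so the gradient is -0.5·x and the update moves w toward the positive label.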


Backpropagation

  • Forward propagation
    • the initial information propagates up to the hidden units at each layer and finally produces output
  • Backpropagation
    • allows the information from the cost to flow backwards through the network in order to compute the gradients
  • Chain Rule

    • Computing the derivative of the composition of functions

      • $\space f(g(x))' = f'(g(x))g'(x)$

      • $\space {dz \over dx} = {dz \over dy} \bullet {dy \over dx}$

      • $\space {dz \over dw} = ({dz \over dy} \bullet {dy \over dx}) \bullet {dx \over dw}$

      • $\space {dz \over du} = ({dz \over dy} \bullet {dy \over dx} \bullet {dx \over dw}) \bullet {dw \over du}$

  • Backpropagation

    • Update weights recursively with memory
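The chain rule can be checked numerically: for $z = f(g(x))$ with $f(y) = y^2$ and $g(x) = 3x$, the analytic derivative $f'(g(x))\,g'(x) = 2(3x)\cdot 3 = 18x$ should agree with a finite-difference estimate:

```python
def g(x): return 3.0 * x
def f(y): return y ** 2

def z(x): return f(g(x))          # composition z = f(g(x)) = 9x^2

x0 = 2.0
analytic = 2 * g(x0) * 3.0        # chain rule: f'(g(x0)) * g'(x0) = 18 * x0
eps = 1e-6
numeric = (z(x0 + eps) - z(x0 - eps)) / (2 * eps)   # central difference
print(analytic, numeric)          # both approximately 36
```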

Optimization procedure


  • In general, it is not easy to numerically compute gradients in a network.
    • The good news: people have already done all the "hard work" of developing numerical solvers (or libraries)
    • There is a wide range of tools, e.g., TensorFlow

Summary

  • Learning weights and biases from data using gradient descent


6. ANN with MNIST¶

6.1. What's an MNIST?¶

From Wikipedia

  • The MNIST database (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, NIST's complete dataset was too hard.
  • MNIST (Mixed National Institute of Standards and Technology database) database
    • Handwritten digit database
    • $28 \times 28$ gray scaled image
    • Flattened matrix into a vector of $28 \times 28 = 784$
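Since each image is stored as a flattened length-784 row, reshaping recovers the $28 \times 28$ grid. A sketch (the random vector is a stand-in for one row of mnist.train.images):

```python
import numpy as np

# a flattened MNIST image is a length-784 vector of pixel intensities
img_vec = np.random.rand(784)     # stand-in for one row of mnist.train.images
img = img_vec.reshape(28, 28)     # recover the 2D pixel grid
print(img.shape)                  # (28, 28)
```

`plt.imshow(img, cmap='gray')` would then render it as an image.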



More here

We will be using MNIST to create a multinomial classifier that can detect which of the classes 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9 a given MNIST image belongs to. Succinctly, we are teaching a computer to recognize handwritten digits.

InĀ [Ā ]:
# import os
# os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"]="0"
InĀ [49]:
# Import Library
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

Let's download and load the dataset.

InĀ [50]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
InĀ [51]:
print ("The training data set is:\n")
print (mnist.train.images.shape)
print (mnist.train.labels.shape)
The training data set is:

(55000, 784)
(55000, 10)
InĀ [52]:
print ("The test data set is:")
print (mnist.test.images.shape)
print (mnist.test.labels.shape)
The test data set is:
(10000, 784)
(10000, 10)

Display a few random samples from it:

InĀ [53]:
mnist.train.images[5]
Out[53]:
(a 784-element float vector of pixel intensities in [0, 1], mostly zeros with nonzero values along the digit's strokes; long output elided)
       0.8705883 , 0.14117648, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.03137255, 0.7176471 ,
       0.9921569 , 0.227451  , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.46274513, 0.9960785 , 0.54509807, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.05490196, 0.7294118 , 0.9960785 , 0.9960785 , 0.227451  ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.2784314 ,
       0.9686275 , 0.9686275 , 0.54509807, 0.0627451 , 0.        ,
       0.        , 0.07450981, 0.227451  , 0.87843144, 0.9921569 ,
       0.9921569 , 0.8313726 , 0.03529412, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.42352945, 0.9921569 ,
       0.9921569 , 0.92549026, 0.6862745 , 0.6862745 , 0.9686275 ,
       0.9921569 , 0.9960785 , 0.9921569 , 0.77647066, 0.16862746,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.2627451 , 0.8352942 , 0.8980393 , 0.9960785 ,
       0.9921569 , 0.9921569 , 0.9921569 , 0.9921569 , 0.83921576,
       0.48627454, 0.02352941, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.09019608, 0.60784316, 0.60784316, 0.8745099 ,
       0.7843138 , 0.46274513, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        ], dtype=float32)
InĀ [54]:
# well, that's not a picture (or image), it's an array.

mnist.train.images[5].shape
Out[54]:
(784,)

You might think the training set is made up of 28 $\times$ 28 grayscale images of handwritten digits. Not quite!

Each image has been flattened: every 28 $\times$ 28 pixel grid is stored as a 1D array of 784 values. Let's reshape one back.

InĀ [55]:
img = np.reshape(mnist.train.images[5], [28,28])
InĀ [56]:
img = mnist.train.images[5].reshape([28,28])
InĀ [57]:
# So now we have a 28x28 matrix, where each element is an intensity level from 0 to 1.  
img.shape
Out[57]:
(28, 28)

Let's visualize what some of these images and their corresponding training labels look like.

InĀ [58]:
plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
InĀ [59]:
mnist.train.labels[5]
Out[59]:
array([0., 0., 0., 0., 0., 0., 0., 0., 1., 0.])
InĀ [60]:
np.argmax(mnist.train.labels[5])
Out[60]:
8

A mini-batch generator is built into the dataset object

InĀ [61]:
x, y = mnist.train.next_batch(3)

print(x.shape)
print(y.shape)
(3, 784)
(3, 10)

6.2. ANN with TensorFlow¶

  • Feed a gray image to ANN


  • Our network model



- Network training (learning) $$\omega := \omega - \alpha \nabla_{\omega} \, \ell \left( h_{\omega} \left(x^{(i)}\right),y^{(i)}\right)$$
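The update rule above can be sketched in plain NumPy. As an illustration we use a linear model with a squared-error loss (the MNIST model below uses cross entropy instead):

```python
import numpy as np

# One gradient-descent update on a single example, illustrating
# omega := omega - alpha * grad loss(h_omega(x), y).
# Here h_omega(x) = omega @ x with a squared-error loss -- an
# illustrative stand-in for the cross-entropy loss used below.

def sgd_step(omega, x, y, alpha=0.1):
    residual = omega @ x - y        # h_omega(x) - y
    grad = 2.0 * residual * x       # gradient of (h_omega(x) - y)^2
    return omega - alpha * grad

omega = np.array([0.0, 0.0])
x, y = np.array([1.0, 2.0]), 3.0

loss_before = (omega @ x - y) ** 2
omega = sgd_step(omega, x, y)
loss_after = (omega @ x - y) ** 2
```

A single step already reduces the loss on this example; in training, such steps are repeated over mini-batches.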

6.2.1. Import Library¶

InĀ [62]:
# Import Library
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

6.2.2. Load MNIST Data¶

  • Download MNIST data from tensorflow tutorial example
InĀ [63]:
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
InĀ [64]:
train_x, train_y = mnist.train.next_batch(1)
img = train_x[0,:].reshape(28,28)

plt.figure(figsize=(6,6))
plt.imshow(img,'gray')
plt.title("Label : {}".format(np.argmax(train_y[0,:])))
plt.xticks([])
plt.yticks([])
plt.show()

One hot encoding

InĀ [65]:
print ('Train labels : {}'.format(train_y[0, :]))
Train labels : [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
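In plain NumPy, one-hot encoding and its inverse (argmax) can be sketched as follows (a minimal illustration, not part of the dataset API):

```python
import numpy as np

# Encode a class label as a one-hot vector; decode it back with argmax.
def one_hot(label, n_classes=10):
    v = np.zeros(n_classes)
    v[label] = 1.0
    return v

v = one_hot(8)            # e.g. the label 8 seen above
```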

6.2.3. Define an ANN Structure¶

  • Input size
  • Hidden layer size
  • The number of classes


InĀ [68]:
n_input = ?
n_hidden = ?
n_output = ?
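One plausible completion of the blanks above: the input and output sizes follow from MNIST, while the hidden size 100 is an arbitrary design choice.

```python
n_input = 28 * 28    # 784 pixels per flattened MNIST image
n_hidden = 100       # hidden-layer width (a free design choice)
n_output = 10        # one class per digit, 0-9
```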

6.2.4. Define Weights, Biases, and Placeholder¶

  • Define parameters based on predefined layer size
  • Initialize with normal distribution with $\mu = 0$ and $\sigma = 0.1$
InĀ [69]:
weights = {
    'hidden' : ?,
    'output' : ?
}

biases = {
    'hidden' : ?,
    'output' : ?
}
InĀ [70]:
x = tf.placeholder(?)
y = tf.placeholder(?)
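A NumPy sketch of the intended initialization — normal with $\mu = 0$, $\sigma = 0.1$ — using the layer sizes assumed above (in TensorFlow this would use `tf.random_normal` with `stddev = 0.1`, as in the autoencoder section below):

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden, n_output = 784, 100, 10   # sizes assumed as above

# Weight matrices map layer inputs to layer outputs;
# biases have the output width of their layer.
weights = {
    'hidden': rng.normal(0.0, 0.1, size=(n_input, n_hidden)),
    'output': rng.normal(0.0, 0.1, size=(n_hidden, n_output)),
}
biases = {
    'hidden': rng.normal(0.0, 0.1, size=n_hidden),
    'output': rng.normal(0.0, 0.1, size=n_output),
}
```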

6.2.5. Build a Model¶

First, the layer performs a matrix multiplication to produce a set of linear activations



$$y_j = \left(\sum\limits_i \omega_{ij}x_i\right) + b_j$$$$y = \omega^T x + b$$


Second, each linear activation is run through a nonlinear activation function




Third, predict values with an affine transformation



InĀ [71]:
# Define Network
def build_model(x, weights, biases):
    
    # first hidden layer
    hidden = tf.add(tf.matmul(x, weights['hidden']), biases['hidden'])
    # non-linear activate function
    hidden = tf.nn.relu(hidden)
    
    # Output layer 
    output = tf.add(tf.matmul(hidden, weights['output']), biases['output'])
    
    return output
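The same forward pass in plain NumPy, assuming the layer sizes from above (a sketch of the computation, not the TensorFlow graph):

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def forward(x, W_h, b_h, W_o, b_o):
    hidden = relu(x @ W_h + b_h)   # affine transform + nonlinearity
    return hidden @ W_o + b_o      # final affine transform (logits)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 784))                      # a batch of 3 "images"
W_h, b_h = rng.normal(0, 0.1, (784, 100)), np.zeros(100)
W_o, b_o = rng.normal(0, 0.1, (100, 10)), np.zeros(10)
logits = forward(x, W_h, b_h, W_o, b_o)
```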

6.2.6. Define Loss and Optimizer¶

Loss

  • This defines how we measure how accurate the model is during training. As was covered in lecture, during training we want to minimize this function, which will "steer" the model in the right direction.
  • Classification: Cross entropy
    • Equivalent to applying logistic regression
$$ -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log\left(h_{\theta}\left(x^{(i)}\right)\right) + \left(1-y^{(i)}\right)\log\left(1-h_{\theta}\left(x^{(i)}\right)\right)\right] $$
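A NumPy sketch of the softmax cross entropy computed by `tf.nn.softmax_cross_entropy_with_logits` in the cell below (the logits and labels here are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, onehot):
    p = softmax(logits)
    return -np.mean(np.sum(onehot * np.log(p), axis=1))

logits = np.array([[5.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0]])
correct = np.array([[1.0, 0.0, 0.0],     # labels matching the logits
                    [0.0, 1.0, 0.0]])
wrong = np.array([[0.0, 1.0, 0.0],       # labels contradicting them
                  [1.0, 0.0, 0.0]])
```

Confident, correct predictions yield a much smaller loss than confident, wrong ones, which is what steers the weights during training.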

Optimizer

  • This defines how the model is updated based on the data it sees and its loss function.
  • AdamOptimizer: one of the most widely used optimizers
InĀ [72]:
# Define Loss
pred = build_model(x, weights, biases)
loss = tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y)
loss = tf.reduce_mean(loss)

LR = 0.0001
optm = tf.train.AdamOptimizer(LR).minimize(loss)

6.2.7. Define Optimization Configuration and Then Optimize¶




  • Define parameters for training ANN
    • n_batch: batch size for mini-batch gradient descent
    • n_iter: the number of iteration steps
    • n_prt: check loss for every n_prt iteration
  • Metrics
    • Here we can define metrics used to monitor the training and testing steps. In this example, we'll look at the accuracy, the fraction of the images that are correctly classified.

Initializer

  • Initialize all the variables
InĀ [73]:
n_batch = 50     # Batch Size
n_iter = 3000    # Learning Iteration
n_prt = 250      # Print Cycle
InĀ [74]:
# open session
# define init
# run init

?
?
?

# c1 is for train
# c2 is for test

loss_record_train = []
loss_record_test = []
for epoch in range(n_iter):
    train_x, train_y = mnist.train.next_batch(n_batch)
    sess.run(optm, feed_dict = {x: train_x, y: train_y}) 
    
    if epoch % n_prt == 0:        
        test_x, test_y = mnist.test.next_batch(n_batch)
        c1 = sess.run(loss, feed_dict = {x: train_x, y: train_y})
        c2 = sess.run(loss, feed_dict = {x: ?, y: ?})
        loss_record_train.append(c1)
        loss_record_test.append(c2)
        print ("Iter : {}".format(epoch))
        print ("Cost : {}".format(c1))
        
plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record_train))*n_prt, 
         loss_record_train, label = 'training')
plt.plot(np.arange(len(loss_record_test))*n_prt, 
         loss_record_test, label = 'testing')
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.legend(fontsize = 12)
plt.ylim([0, np.max(loss_record_train)])
plt.show()
Iter : 0
Cost : 2.5378339290618896
Iter : 250
Cost : 1.4261671304702759
Iter : 500
Cost : 0.6707857251167297
Iter : 750
Cost : 0.4603263735771179
Iter : 1000
Cost : 0.3748474419116974
Iter : 1250
Cost : 0.3407704532146454
Iter : 1500
Cost : 0.3811053931713104
Iter : 1750
Cost : 0.3613077402114868
Iter : 2000
Cost : 0.4578743875026703
Iter : 2250
Cost : 0.1864049881696701
Iter : 2500
Cost : 0.42172595858573914
Iter : 2750
Cost : 0.34328335523605347

6.2.8. Test or Evaluate¶

InĀ [75]:
test_x, test_y = mnist.test.next_batch(100)

my_pred = sess.run(pred, feed_dict = {x : test_x})
my_pred = np.argmax(my_pred, axis = 1)

labels = np.argmax(test_y, axis = 1)

accr = np.mean(np.equal(my_pred, labels))
print("Accuracy : {}%".format(accr*100))
Accuracy : 95.0%
InĀ [78]:
test_x, test_y = mnist.test.next_batch(1)
logits = sess.run(tf.nn.softmax(pred), feed_dict = {x : test_x})
predict = np.argmax(logits)

plt.figure(figsize = (6,6))
plt.imshow(test_x.reshape(28,28), 'gray')
plt.xticks([])
plt.yticks([])
plt.show()

print('Prediction : {}'.format(predict))
np.set_printoptions(precision = 2, suppress = True)
print('Probability : {}'.format(logits.ravel()))
Prediction : 3
Probability : [0.07 0.   0.02 0.85 0.   0.05 0.   0.   0.01 0.  ]

You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of overfitting, when a machine learning model performs worse on new data than on its training data.

What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...

$\Rightarrow$ As we saw in lecture, convolutional neural networks (CNNs) are particularly well-suited for a variety of tasks in computer vision, and have achieved near-perfect accuracies on the MNIST dataset. We will build a CNN and ultimately output a probability distribution over the 10 digit classes (0-9) in the next lectures.

7. Autoencoder¶

7.1. Unsupervised Learning¶


Definition

  • Unsupervised learning refers to most attempts to extract information from a distribution that do not require human labor to annotate examples
  • Main task is to find the 'best' representation of the data

Dimension Reduction

  • Attempt to compress as much information as possible in a smaller representation
  • Preserve as much information as possible while obeying some constraint aimed at keeping the representation simpler

7.2. Autoencoders¶

It can be thought of as a 'deep learning version' of unsupervised learning.


Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder and a decoder that produce a reconstruction


Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $x = g(z)$
  • We learn to set $g\left(f(x)\right) = x$





  • Autoencoder combines an encoder $f$ from the original space $\mathscr{X}$ to a latent space $\mathscr{F}$, and a decoder $g$ to map back to $\mathscr{X}$, such that $g \circ f$ is [close to] the identity on the data


$$ \mathbb{E} \left[ \lVert X - g \circ f(X) \rVert^2 \right] \approx 0$$



  • A proper autoencoder has to capture a "good" parametrization of the signal, and in particular the statistical dependencies between the signal components.
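For linear $f$ and $g$, the objective above reduces to PCA; a minimal NumPy sketch on toy 3-D data lying near a line shows a 1-D code reconstructing the data almost perfectly:

```python
import numpy as np

# Toy data near a 1-D subspace of R^3, plus a little noise.
rng = np.random.default_rng(0)
z_true = rng.normal(size=(200, 1))
X = z_true @ np.array([[2.0, -1.0, 0.5]]) + 0.01 * rng.normal(size=(200, 3))

# Linear encoder f(x) = x W_e: project onto the top principal direction.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W_e = Vt[:1].T
Z = X @ W_e                                   # 1-D latent codes

# Linear decoder g(z) = z W_d, fitted by least squares.
W_d, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_rec = Z @ W_d

err = np.mean(np.sum((X - X_rec) ** 2, axis=1))   # E[||X - g(f(X))||^2]
```

The nonlinear MLP encoder and decoder below play the same roles, but can capture curved manifolds that this linear version cannot.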

7.3. Autoencoder with TensorFlow¶

  • MNIST example
  • Use only (1, 5, 6) digits to visualize in 2-D



7.3.1. Import Library¶

InĀ [Ā ]:
# import os
# os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"]="0"
InĀ [79]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

7.3.2. Load MNIST Data¶

InĀ [80]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
  • Only use (1, 5, 6) digits to visualize latent space in 2-D
InĀ [81]:
train_idx = ((np.argmax(mnist.train.labels, 1) == 1) | \
             (np.argmax(mnist.train.labels, 1) == 5) | \
             (np.argmax(mnist.train.labels, 1) == 6))
test_idx = ((np.argmax(mnist.test.labels, 1) == 1) | \
            (np.argmax(mnist.test.labels, 1) == 5) | \
            (np.argmax(mnist.test.labels, 1) == 6))

train_imgs   = mnist.train.images[train_idx]
train_labels = mnist.train.labels[train_idx]
test_imgs    = mnist.test.images[test_idx]
test_labels  = mnist.test.labels[test_idx]
n_train      = train_imgs.shape[0]
n_test       = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 16583, shape : (16583, 784)
The number of testing images : 2985, shape : (2985, 784)

7.3.3. Define a Structure of an Autoencoder¶

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape


InĀ [82]:
# Shape of input and latent variable

n_input = ?

# Encoder structure
n_encoder1 = ?
n_encoder2 = ?

n_latent = ?

# Decoder structure
n_decoder2 = ?
n_decoder1 = ?
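One plausible completion of the blanks: only `n_latent = 2` is dictated by the 2-D visualization, while the encoder widths 500 and 300 are illustrative choices, mirrored in the decoder.

```python
n_input = 28 * 28   # 784

# Encoder (widths are illustrative choices)
n_encoder1 = 500
n_encoder2 = 300

n_latent = 2        # 2-D so the latent space can be plotted

# Decoder mirrors the encoder
n_decoder2 = 300
n_decoder1 = 500
```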

7.3.4. Define Weights, Biases, and Placeholder¶

  • Define weights and biases for encoder and decoder, separately
  • Based on the pre-defined layer size
  • Initialize with normal distribution of $\mu=0$ and $\sigma=0.1$
InĀ [83]:
weights = {
    'encoder1' : tf.Variable(tf.random_normal([n_input, n_encoder1], stddev = 0.1)),
    'encoder2' : tf.Variable(tf.random_normal([n_encoder1, n_encoder2], stddev = 0.1)),
    'latent' : tf.Variable(tf.random_normal([n_encoder2, n_latent], stddev = 0.1)),
    'decoder2' : tf.Variable(tf.random_normal([n_latent, n_decoder2], stddev = 0.1)),
    'decoder1' : tf.Variable(tf.random_normal([n_decoder2, n_decoder1], stddev = 0.1)),
    'reconst' : tf.Variable(tf.random_normal([n_decoder1, n_input], stddev = 0.1))
}

biases = {
    'encoder1' : tf.Variable(tf.random_normal([?], stddev = 0.1)),
    'encoder2' : tf.Variable(tf.random_normal([?], stddev = 0.1)),
    'latent' : tf.Variable(tf.random_normal([?], stddev = 0.1)),
    'decoder2' : tf.Variable(tf.random_normal([?], stddev = 0.1)),
    'decoder1' : tf.Variable(tf.random_normal([?], stddev = 0.1)),
    'reconst' : tf.Variable(tf.random_normal([?], stddev = 0.1))
}
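Each bias vector's length is the output width of its layer; a sketch of the intended shapes for the blanks above (the layer widths assumed here are illustrative):

```python
# Bias vector lengths match each layer's output width.
# Layer widths below are illustrative assumptions.
n_input, n_encoder1, n_encoder2 = 784, 500, 300
n_latent, n_decoder2, n_decoder1 = 2, 300, 500

bias_shapes = {
    'encoder1': n_encoder1,
    'encoder2': n_encoder2,
    'latent':   n_latent,
    'decoder2': n_decoder2,
    'decoder1': n_decoder1,
    'reconst':  n_input,    # reconstruction is back in input space
}
```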
InĀ [84]:
x = tf.placeholder(tf.float32, [None, n_input])

7.3.5. Build a Model¶

Encoder

  • Simple ANN (MLP) model
  • Use tanh for a nonlinear activation function
  • No nonlinear activation function is applied to latent

Decoder

  • Simple ANN (MLP) model
  • Use tanh for a nonlinear activation function
  • No nonlinear activation function is applied to reconst


InĀ [85]:
def encoder(x, weights, biases):
    encoder1 = tf.add(tf.matmul(x, weights['encoder1']), biases['encoder1'])
    encoder1 = tf.nn.tanh(encoder1)
    
    encoder2 = tf.add(tf.matmul(encoder1, weights['encoder2']), biases['encoder2'])
    encoder2 = tf.nn.tanh(encoder2)
    
    latent = tf.add(tf.matmul(encoder2, weights['latent']), biases['latent'])

    return latent
InĀ [86]:
def decoder(latent, weights, biases):
    decoder2 = tf.add(tf.matmul(latent, weights['decoder2']), biases['decoder2'])
    decoder2 = tf.nn.tanh(decoder2)
    
    decoder1 = tf.add(tf.matmul(decoder2, weights['decoder1']), biases['decoder1'])
    decoder1 = tf.nn.tanh(decoder1)
    
    reconst = tf.add(tf.matmul(decoder1, weights['reconst']), biases['reconst'])
   
    return reconst

7.3.6. Define Loss and Optimizer¶

Loss

  • Squared loss
$$ \frac{1}{m}\sum_{i=1}^{m} (t_{i} - y_{i})^2 $$

Optimizer

  • AdamOptimizer: one of the most widely used optimizers
InĀ [87]:
LR = 0.0001

latent = encoder(x, weights, biases)
reconst = decoder(latent, weights, biases)
loss = tf.square(tf.subtract(x, reconst))
loss = tf.reduce_mean(loss)

optm = tf.train.AdamOptimizer(LR).minimize(loss)

7.3.7. Define Optimization Configuration and Then Optimize¶



  • Define parameters for training autoencoder
    • n_batch : batch size for mini-batch gradient descent
    • n_iter : the number of iteration steps
    • n_prt : check loss for every n_prt iteration
InĀ [88]:
n_batch = 50
n_iter = 2500
n_prt = 250
InĀ [89]:
def train_batch_maker(batch_size):
    random_idx = np.random.randint(n_train, size = batch_size)
    return train_imgs[random_idx], train_labels[random_idx]
InĀ [90]:
def test_batch_maker(batch_size):
    random_idx = np.random.randint(n_test, size = batch_size)
    return test_imgs[random_idx], test_labels[random_idx]
InĀ [91]:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

loss_record_train = []
loss_record_test = []
for epoch in range(n_iter):
    train_x, _ = train_batch_maker(n_batch)
    sess.run(optm, feed_dict = {x : train_x})  
    
    if epoch % n_prt == 0:
        test_x, _ = test_batch_maker(n_batch)
        c1 = sess.run(loss, feed_dict = {x: train_x})
        c2 = sess.run(loss, feed_dict = {x: test_x})
        loss_record_train.append(c1)
        loss_record_test.append(c2)
        print ("Iter : {}".format(epoch))
        print ("Cost : {}".format(c1))
        
plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record_train))*n_prt, loss_record_train, label = 'training')
plt.plot(np.arange(len(loss_record_test))*n_prt, loss_record_test, label = 'testing')
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.legend(fontsize = 12)
plt.ylim([0,np.max(loss_record_train)])
plt.show()
Iter : 0
Cost : 0.38241109251976013
Iter : 250
Cost : 0.048531707376241684
Iter : 500
Cost : 0.045364025980234146
Iter : 750
Cost : 0.04366779327392578
Iter : 1000
Cost : 0.04312220960855484
Iter : 1250
Cost : 0.037780530750751495
Iter : 1500
Cost : 0.035730790346860886
Iter : 1750
Cost : 0.03717361018061638
Iter : 2000
Cost : 0.035494640469551086
Iter : 2250
Cost : 0.039774127304553986

7.3.8. Test or Evaluate¶

  • Test reconstruction performance of the autoencoder
InĀ [92]:
test_x, _ = test_batch_maker(1)
x_reconst = sess.run(reconst, feed_dict = {x: test_x})

plt.figure(figsize = (10,8))
plt.subplot(1,2,1)
plt.imshow(test_x.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(x_reconst.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
  • To see the distribution of latent variables, we make a projection of 784-dimensional image space onto 2-dimensional latent space
InĀ [93]:
test_x, test_y = test_batch_maker(500)
test_y = np.argmax(test_y, axis = 1)
test_latent = sess.run(latent, feed_dict = {x: test_x})

plt.figure(figsize = (10,10))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize=15)
plt.xlabel('Z1', fontsize=15)
plt.ylabel('Z2', fontsize=15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()

Data Generation

  • It generates something that makes sense.

  • These results are unsatisfying, because the density model used on the latent space ℱ is too simple and inadequate.

  • Building a ā€œgoodā€ model amounts to our original problem of modeling an empirical distribution, although it may now be in a lower dimension space.

  • This is one motivation for VAEs and GANs.

InĀ [94]:
new_data = np.array([[-4, 0]])

latent_input = tf.placeholder(tf.float32, [None, n_latent])
reconst = decoder(latent_input, weights, biases)
fake_image = sess.run(reconst, feed_dict = {latent_input: new_data})

plt.figure(figsize=(16,7))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()

7.4. Visualization¶

Image Generation

  • Select an arbitrary latent variable $z$
  • Generate images using the learned decoder
InĀ [95]:
# Initialize canvas
nx = 20
ny = 20
x_values = np.linspace(-8, 4, nx)
y_values = np.linspace(-4, 6, ny)
canvas = np.empty((28*ny, 28*nx))

# Define placeholder
latent_input = tf.placeholder(tf.float32, [None, n_latent])
reconst = decoder(latent_input, weights, biases)

for i, yi in enumerate(y_values):
        for j, xi in enumerate(x_values):
            latent_ = np.array([[xi, yi]])
            reconst_ = sess.run(reconst, feed_dict = {latent_input: latent_})
            canvas[(nx-i-1)*28:(nx-i)*28,j*28:(j+1)*28] = reconst_.reshape(28, 28)

plt.figure(figsize = (16, 7))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(canvas, 'gray')
plt.title('Manifold', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()

7.5. Latent Representation¶

To get an intuition of the latent representation, we can pick two samples $x$ and $x'$ at random and interpolate samples along the line between them in the latent space

$$g((1-\alpha)f(x) + \alpha f(x'))$$
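The latent-space half of this interpolation can be sketched in NumPy (the latent codes $f(x)$ and $f(x')$ here are hypothetical 2-D points; each point on the path would then be fed to the decoder $g$):

```python
import numpy as np

def interpolate(z0, z1, n=5):
    # points (1 - alpha) z0 + alpha z1 along the latent-space segment
    alphas = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - alphas) * z0 + alphas * z1

z0 = np.array([-4.0, 0.0])    # f(x):  illustrative latent code
z1 = np.array([2.0, 3.0])     # f(x'): illustrative latent code
path = interpolate(z0, z1)
```

Interpolating in latent space and decoding gives plausible intermediate digits, whereas interpolating raw pixels merely cross-fades two images.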



  • Interpolation in High Dimension



  • Interpolation in Manifold



InĀ [96]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')